ETSI TS 126 091 V10.0.0 (2011-04)

Technical Specification 

Digital cellular telecommunications system (Phase 2+); 
Universal Mobile Telecommunications System (UMTS); 

LTE; 
Mandatory Speech Codec speech processing functions; 

Adaptive Multi-Rate (AMR) speech codec; 

Error concealment of lost frames 

(3GPP TS 26.091 version 10.0.0 Release 10) 











Reference 



RTS/TSGS-0426091va00 
Keywords 



GSM, LTE, UMTS 



ETSI 

650 Route des Lucioles 
F-06921 Sophia Antipolis Cedex - FRANCE 

Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the
Sous-Préfecture de Grasse (06) N° 7803/88



Important notice 



Individual copies of the present document can be downloaded from: 
http://www.etsi.org

The present document may be made available in more than one electronic version or in print. In any case of existing or
perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF).
In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive
within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. 

Information on the current status of this and other ETSI documents is available at
http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services:
http://portal.etsi.org/chaircor/ETSI_support.asp

Copyright Notification 

No part may be reproduced except as authorized by written permission. 
The copyright and the foregoing restriction extend to reproduction in all media. 

© European Telecommunications Standards Institute 2011.
All rights reserved. 

DECT™, PLUGTESTS™, UMTS™, TIPHON™, the TIPHON logo and the ETSI logo are Trade Marks of ETSI registered
for the benefit of its Members.
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
LTE™ is a Trade Mark of ETSI currently being registered for the benefit of its Members and of the 3GPP
Organizational Partners.

GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association. 






Intellectual Property Rights 



IPRs essential or potentially essential to the present document may have been declared to ETSI. The information 
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found 
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in 
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web 
server ( http://webapp.etsi.org/IPR/home.asp ). 

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee 
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web 
server) which are, or may be, or may become, essential to the present document. 



Foreword 

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP). 

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or 
GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables. 

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under 
http://webapp.etsi.org/key/queryform.asp.






Contents 



Intellectual Property Rights
Foreword
Foreword
1 Scope
2 References
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 General
5 Requirements
5.1 Error detection
5.2 Lost speech frames
5.3 First lost SID frame
5.4 Subsequent lost SID frames
6 Example ECU/BFH Solution 1
6.1 State Machine
6.2 Assumed Active Speech Frame Error Concealment Unit Actions
6.2.1 BFI = 0, prevBFI = 0, State = 0
6.2.2 BFI = 0, prevBFI = 1, State = 0 or 5
6.2.3 BFI = 1, prevBFI = 0 or 1, State = 1...6
6.2.3.1 LTP-lag update
6.2.3.2 Innovation sequence
6.3 Assumed Non-Active Speech Signal Error Concealment Unit Actions
6.3.1 General
6.3.2 Detectors
6.3.2.1 Background detector
6.3.2.2 Voicing detector
6.3.3 Background ECU Actions
6.4 Substitution and muting of lost SID frames
7 Example ECU/BFH Solution 2
7.1 State Machine
7.2 Substitution and muting of lost speech frames
7.2.1 BFI = 0, prevBFI = 0, State = 0
7.2.2 BFI = 0, prevBFI = 1, State = 0 or 5
7.2.3 BFI = 1, prevBFI = 0 or 1, State = 1...6
7.2.3.1 LTP-lag update
7.2.4 Innovation sequence
7.3 Substitution and muting of lost SID frames
Annex A (informative): Change history
History






Foreword 



This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the document. 
1 Scope



The present document defines an error concealment procedure, also termed frame substitution and muting procedure, 
which shall be used by the AMR speech codec receiving end when one or more lost speech or lost Silence Descriptor 
(SID) frames are received. 

The requirements of the present document are mandatory for implementation in all networks and User Equipment (UE)s 
capable of supporting the AMR speech codec. It is not mandatory to follow the bit exact implementation outlined in the 
present document and the corresponding C source code. 



2 References



The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

[1] 3GPP TS 26.102: "AMR Speech Codec; Interface to Iu and Uu".

[2] 3GPP TS 26.090: "Transcoding functions". 

[3] 3GPP TS 26.093: "Source Controlled Rate operation". 

[4] 3GPP TS 26.101: "Frame Structure". 



3 Definitions and abbreviations 

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply: 

N-point median operation: consists of sorting the N elements belonging to the set for which the median operation is to 
be performed in an ascending order according to their values, and selecting the (int (N/2) + 1) -th largest value of the 
sorted set as the median value 
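
The definition above can be illustrated with a minimal C sketch. The function name median_n and the bound MEDIAN_MAX_N are hypothetical and do not come from the specification or its C source code; the sketch only assumes that the set is small enough to sort in a local buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on N for this sketch; not taken from the specification. */
#define MEDIAN_MAX_N 9

/* N-point median: sort a copy of the set in ascending order and select the
 * (int(N/2) + 1)-th value of the sorted set, i.e. zero-based index N/2. */
float median_n(const float *values, size_t n)
{
    float sorted[MEDIAN_MAX_N];
    size_t i, j;

    assert(n >= 1 && n <= MEDIAN_MAX_N);

    /* Insertion sort into the local buffer; the input array is left untouched. */
    for (i = 0; i < n; i++) {
        float v = values[i];
        for (j = i; j > 0 && sorted[j - 1] > v; j--)
            sorted[j] = sorted[j - 1];
        sorted[j] = v;
    }
    return sorted[n / 2];
}
```

For N = 5 this selects the third value of the sorted set, which is the median5 operation used in the example solutions.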

Further definitions of terms used in the present document can be found in the references. 

3.2 Abbreviations 

For the purposes of the present document, the following abbreviations apply: 

AN Access Network 

BFH Bad Frame Handling 

BFI Bad Frame Indication from AN 

BSI_netw Bad Sub-block Indication obtained from AN interface CRC checks 

CRC Cyclic Redundancy Check 

ECU Error Concealment Unit 

medianN N-point median operation 

PDFI Potentially Degraded Frame Indication 






prevBFI Bad Frame Indication of previous frame 

RX Receive 

SCR Source Controlled Rate (operation) 

SID Silence Descriptor frame (Background descriptor) 



4 General



The purpose of the error concealment procedure is to conceal the effect of lost AMR speech frames. The purpose of
muting the output in the case of several lost frames is to indicate the breakdown of the channel to the user and to avoid
generating possibly annoying sounds as a result of the error concealment procedure.

The network shall indicate lost speech or lost SID frames by setting the RX_TYPE values [3] to SPEECH_BAD or 
SID_BAD. If these flags are set, the speech decoder shall perform parameter substitution to conceal errors. 

The network should also indicate potentially degraded frames using the RX_TYPE value
SPEECH_PROBABLY_DEGRADED. This flag may be derived from channel quality indicators. It may be used by the
speech decoder selectively depending on the estimated signal type.

The example solutions provided in clauses 6 and 7 apply only to bad frame handling on a complete speech frame
basis. Sub-frame based error concealment may be derived using similar methods.



5 Requirements



5.1 Error detection



If the most sensitive bits of the AMR speech data (class A in [4]) are received in error, the network shall indicate
RX_TYPE = SPEECH_BAD in which case the BFI flag is set. If a SID frame is received in error, the network shall
indicate RX_TYPE = SID_BAD in which case the BFI flag is also set. The RX_TYPE = SPEECH_PROBABLY_DEGRADED
flag should be set appropriately using quality information from the channel decoder, in which case the PDFI flag is set.
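
The mapping from RX_TYPE to the BFI and PDFI flags can be sketched in C as follows. This is an illustration only: the enumeration lists just the RX_TYPE names that appear in the present document, SPEECH_GOOD is an assumed name for an error-free speech frame, and the type and function names (rx_type_t, frame_flags_t, flags_from_rx_type) are hypothetical. The complete set of RX_TYPE values is defined in [3].

```c
#include <assert.h>
#include <stdbool.h>

/* RX_TYPE names used in this document. SPEECH_GOOD is an assumed name for
 * an error-free speech frame; the full list is defined in [3]. */
typedef enum {
    SPEECH_GOOD,
    SPEECH_PROBABLY_DEGRADED,
    SPEECH_BAD,
    SID_FIRST,
    SID_UPDATE,
    SID_BAD
} rx_type_t;

typedef struct {
    bool bfi;   /* Bad Frame Indication */
    bool pdfi;  /* Potentially Degraded Frame Indication */
} frame_flags_t;

/* Derive the decoder flags from the RX_TYPE value of the current frame. */
frame_flags_t flags_from_rx_type(rx_type_t rx_type)
{
    frame_flags_t flags = { false, false };

    assert(rx_type <= SID_BAD);

    /* Lost speech frames and lost SID frames both set BFI. */
    flags.bfi = (rx_type == SPEECH_BAD || rx_type == SID_BAD);
    /* A potentially degraded frame sets PDFI only. */
    flags.pdfi = (rx_type == SPEECH_PROBABLY_DEGRADED);
    return flags;
}
```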



5.2 Lost speech frames 



Normal decoding of lost speech frames would result in very unpleasant noise effects. In order to improve the subjective
quality, lost speech frames shall be substituted with either a repetition or an extrapolation of the previous good speech
frame(s). This substitution is done so that it gradually decreases the output level, resulting in silence at the output.
Clauses 6 and 7 provide example solutions.



5.3 First lost SID frame



A lost SID frame shall be substituted by using the SID information from earlier received valid SID frames, and the
procedure for valid SID frames shall be applied as described in [3].



5.4 Subsequent lost SID frames 



For many subsequent lost SID frames, a muting technique shall be applied to the comfort noise that will gradually 
decrease the output level. For subsequent lost SID frames, the muting of the output shall be maintained. Clauses 6 and 7 
provide example solutions. 



6 Example ECU/BFH Solution 1



The C code of the following example is embedded in the bit exact software of the codec. In the code the ECU is 
designed to allow subframe-by-subframe synthesis, thereby reducing the speech synthesis delay to a minimum. 
6.1 State Machine

This example solution for substitution and muting is based on a state machine with seven states (Figure 1). 

The system starts in state 0. Each time a bad frame is detected, the state counter is incremented by one and is saturated 
when it reaches 6. Each time a good speech frame is detected, the state counter is reset to zero, except when we are in 
state 6, where we set the state counter to 5. The state indicates the quality of the channel: the larger the value of the state 
counter, the worse the channel quality is. The control flow of the state machine can be described by the following C 
code (BFI = bad frame indicator, State = state variable):

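if (BFI != 0)
    State = State + 1;   /* bad frame: move to a worse state */
else if (State == 6)
    State = 5;           /* good frame in state 6: step back to 5 only */
else
    State = 0;           /* good frame: reset */
if (State > 6)
    State = 6;           /* saturate at 6 */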

In addition to this state machine, the Bad Frame Flag from the previous frame is checked (prevBFI). The processing
depends on the value of the State variable. In states 0 and 5, the processing depends also on the two flags BFI and
prevBFI.
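
As an illustration, the control flow above can be wrapped in a small C function that is called once per received frame. The function name bfh_next_state is hypothetical and is not taken from the codec source code; the sketch assumes the caller keeps the state variable and prevBFI between frames.

```c
#include <assert.h>

/* One step of the seven-state machine. Returns the new state (0..6) given
 * the current state and the bad frame indicator of the received frame. */
int bfh_next_state(int state, int bfi)
{
    assert(state >= 0 && state <= 6);

    if (bfi != 0)
        state = state + 1;   /* bad frame: channel quality looks worse */
    else if (state == 6)
        state = 5;           /* good frame after a long outage: step back to 5 */
    else
        state = 0;           /* good frame: back to normal decoding */

    if (state > 6)
        state = 6;           /* saturate the counter */

    return state;
}
```

Starting from state 0, a run of bad frames drives the state up to 6 and keeps it there. The first good frame after that yields state 5, and a second consecutive good frame yields state 0.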
The procedure can be described as follows:

[Figure 1: state diagram. States and the flag values that apply in each:
  STATE = 0: BFI = 0, prevBFI = 0 or 1
  STATE = 1: BFI = 1, prevBFI = 0
  STATE = 2: BFI = 1, prevBFI = 1
  STATE = 3: BFI = 1, prevBFI = 1
  STATE = 4: BFI = 1, prevBFI = 1
  STATE = 5: BFI = 0 or 1, prevBFI = 1
  STATE = 6: BFI = 1, prevBFI = 0 or 1
Arrows mark the transitions taken on a bad frame (BFI = 1) and on a good frame (BFI = 0).]

Figure 1: State machine for controlling the bad frame substitution

6.2 Assumed Active Speech Frame Error Concealment Unit Actions

6.2.1 BFI = 0, prevBFI = 0, State = 0



No error is detected in the received or in the previous received speech frame. The received speech parameters are used 
in the normal way in the speech synthesis. The current frame of speech parameters is saved. 



6.2.2 BFI = 0, prevBFI = 1, State = 0 or 5



No error is detected in the received speech frame, but the previous received speech frame was bad. The LTP gain and
fixed codebook gain are limited below the values used for the last received good subframe:

$$ g^p = \begin{cases} g^p, & g^p \le g^p(-1) \\ g^p(-1), & g^p > g^p(-1) \end{cases} \qquad (1) $$

where $g^p$ = current decoded LTP gain, $g^p(-1)$ = LTP gain used for the last good subframe (BFI = 0), and

$$ g^c = \begin{cases} g^c, & g^c \le g^c(-1) \\ g^c(-1), & g^c > g^c(-1) \end{cases} \qquad (2) $$

where $g^c$ = current decoded fixed codebook gain and $g^c(-1)$ = fixed codebook gain used for the last good subframe
(BFI = 0).

The rest of the received speech parameters are used normally in the speech synthesis. The current frame of speech 
parameters is saved. 
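
Both limiting rules above have the same form: the decoded gain is used unless it exceeds the gain of the last good subframe. A minimal C sketch, with the hypothetical function name limit_gain and the assumption that gains are non-negative floating point values:

```c
#include <assert.h>

/* Limit a decoded gain to the gain used for the last good subframe.
 * The same function serves the LTP gain and the fixed codebook gain.
 * Assumption of this sketch: gains are non-negative. */
float limit_gain(float decoded, float last_good)
{
    assert(last_good >= 0.0f);
    return (decoded > last_good) ? last_good : decoded;
}
```

For example, a decoded LTP gain of 1.1 with a last good value of 0.8 is limited to 0.8, while a decoded gain of 0.5 passes through unchanged.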

6.2.3 BFI = 1, prevBFI = 0 or 1, State = 1...6

An error is detected in the received speech frame and the substitution and muting procedure is started. The LTP gain
and fixed codebook gain are replaced by attenuated values from the previous subframes:

$$ g^p = \begin{cases} P(state)\, g^p(-1), & g^p(-1) \le \mathrm{median5}(g^p(-1), \ldots, g^p(-5)) \\ P(state)\, \mathrm{median5}(g^p(-1), \ldots, g^p(-5)), & g^p(-1) > \mathrm{median5}(g^p(-1), \ldots, g^p(-5)) \end{cases} \qquad (3) $$

where $g^p$ = current decoded LTP gain, $g^p(-1), \ldots, g^p(-n)$ = LTP gains used for the last n subframes,
median5() = 5-point median operation, P(state) = attenuation factor (P(1) = 0.98, P(2) = 0.98, P(3) = 0.8, P(4) = 0.3,
P(5) = 0.2, P(6) = 0.2), state = state number, and

$$ g^c = \begin{cases} C(state)\, g^c(-1), & g^c(-1) \le \mathrm{median5}(g^c(-1), \ldots, g^c(-5)) \\ C(state)\, \mathrm{median5}(g^c(-1), \ldots, g^c(-5)), & g^c(-1) > \mathrm{median5}(g^c(-1), \ldots, g^c(-5)) \end{cases} \qquad (4) $$

where $g^c$ = current decoded fixed codebook gain, $g^c(-1), \ldots, g^c(-n)$ = fixed codebook gains used for the last n
subframes, median5() = 5-point median operation, C(state) = attenuation factor (C(1) = 0.98, C(2) = 0.98, C(3) = 0.98,
C(4) = 0.98, C(5) = 0.98, C(6) = 0.7), and state = state number.
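
The two substitution rules above can be sketched in floating point C as shown below. The attenuation tables use the P(state) and C(state) values given in the text. The function names, the array layout (past[0] holds the gain of the most recent subframe) and the use of floating point are assumptions of this sketch; the bit exact codec software is written differently.

```c
#include <assert.h>

/* Attenuation factors from the text, indexed by state; index 0 is unused. */
static const float P_ATTEN[7] = { 1.0f, 0.98f, 0.98f, 0.8f, 0.3f, 0.2f, 0.2f };
static const float C_ATTEN[7] = { 1.0f, 0.98f, 0.98f, 0.98f, 0.98f, 0.98f, 0.7f };

/* 5-point median: third value of the set sorted in ascending order. */
float median5(const float g[5])
{
    float s[5];
    int i, j;

    for (i = 0; i < 5; i++) {
        float v = g[i];
        for (j = i; j > 0 && s[j - 1] > v; j--)
            s[j] = s[j - 1];
        s[j] = v;
    }
    return s[2];
}

/* Shared form of both rules: attenuate the smaller of g(-1) and the median
 * of the last five gains. past[0] holds g(-1), past[4] holds g(-5). */
float conceal_gain(const float past[5], float attenuation)
{
    float med = median5(past);
    float base = (past[0] > med) ? med : past[0];
    return attenuation * base;
}

/* Substituted LTP gain for a bad frame in the given state (1..6). */
float conceal_ltp_gain(const float past_gp[5], int state)
{
    assert(state >= 1 && state <= 6);
    return conceal_gain(past_gp, P_ATTEN[state]);
}

/* Substituted fixed codebook gain for a bad frame in the given state (1..6). */
float conceal_code_gain(const float past_gc[5], int state)
{
    assert(state >= 1 && state <= 6);
    return conceal_gain(past_gc, C_ATTEN[state]);
}
```

Taking the smaller of the last gain and the median keeps a single unusually large gain in the history from being repeated during concealment.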

The higher the state value is, the more the gains are attenuated. Also the memory of the predictive fixed codebook gain
is updated by using the average value of the past four values in the memory:

$$ ener(0) = \frac{1}{4} \sum_{i=1}^{4} ener(-i) \qquad (5) $$
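
A minimal C sketch of this averaging follows. The array layout (ener_hist[0] holds ener(-1)) and the way the new value is shifted into the memory are assumptions made for illustration; the text only specifies that the new value is the average of the past four.

```c
#include <assert.h>

#define ENER_HIST 4

/* Average of the past four predictor memory values.
 * ener_hist[0] is ener(-1), ..., ener_hist[3] is ener(-4). */
float concealed_ener(const float ener_hist[ENER_HIST])
{
    float sum = 0.0f;
    int i;

    assert(ener_hist != 0);
    for (i = 0; i < ENER_HIST; i++)
        sum += ener_hist[i];
    return sum / (float)ENER_HIST;
}

/* Assumed memory update: the oldest value is dropped and the average
 * becomes the newest entry, so that it plays the role of ener(-1) next time. */
void update_ener_memory(float ener_hist[ENER_HIST])
{
    float avg = concealed_ener(ener_hist);
    int i;

    for (i = ENER_HIST - 1; i > 0; i--)
        ener_hist[i] = ener_hist[i - 1];
    ener_hist[0] = avg;
}
```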

The past LSFs are shifted towards their mean:

$$ \mathit{lsf\_q1}(i) = \mathit{lsf\_q2}(i) = \alpha \, \mathit{past\_lsf\_q}(i) + (1 - \alpha) \, \mathit{mean\_lsf}(i), \quad i = 0 \ldots 9 \qquad (6) $$

where $\alpha$ = 0.95, lsf_q1 and lsf_q2 are two sets of LSF-vectors for current frame, past_lsf_q is lsf_q2 from the previous
frame, and mean_lsf is the average LSF-vector. Note that two sets of LSFs are available only in the 12.2 mode.
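
The LSF shift above is a per-coefficient weighted average of the previous quantized LSF vector and the mean vector. A floating point C sketch, assuming ten LSF coefficients as in the index range i = 0...9 and using the hypothetical function name conceal_lsf:

```c
#include <assert.h>

#define LSF_ORDER 10      /* i = 0...9 */
#define LSF_ALPHA 0.95f   /* weight of the past LSF vector */

/* Move the past LSFs towards their mean and write the result to both
 * LSF sets of the current frame. */
void conceal_lsf(const float past_lsf_q[LSF_ORDER],
                 const float mean_lsf[LSF_ORDER],
                 float lsf_q1[LSF_ORDER],
                 float lsf_q2[LSF_ORDER])
{
    int i;

    assert(past_lsf_q != 0 && mean_lsf != 0 && lsf_q1 != 0 && lsf_q2 != 0);

    for (i = 0; i < LSF_ORDER; i++) {
        float v = LSF_ALPHA * past_lsf_q[i] + (1.0f - LSF_ALPHA) * mean_lsf[i];
        lsf_q1[i] = v;
        lsf_q2[i] = v;
    }
}
```

Because the output of one bad frame becomes past_lsf_q for the next, repeated bad frames move the spectrum progressively closer to the mean LSF vector.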

6.2.3.1 LTP-lag update 

The LTP-lag values are replaced by the past value from the 4th subframe of the previous frame (12.2 mode) or slightly 
modified values based on the last correctly received value (all other modes). 

6.2.3.2 Innovation sequence 

The received fixed codebook innovation pulses from the erroneous frame are used in the state in which they were
received when corrupted data are received. In the case when no data were received, random fixed codebook indices
should be employed.
6.3 Assumed Non-Active Speech Signal Error Concealment Unit Actions

6.3.1 General 

The Non-Active Speech ECU is used to reduce the negative impact of amplitude variations and tonal artefacts when
using the conventional Active Speech ECU in non-voiced signals such as background noise and unvoiced speech. The
background ECU actions are only used for the lower rate Speech Coding modes.

The Non-Active Speech ECU actions are done as postprocessing actions of the Active Speech ECU actions, thus
ensuring that the Active Speech ECU states are continuously updated. This will guarantee instant and seamless
switching to the Active Speech ECU. The detectors and state updates have to be running continuously for all speech
coding modes to avoid switching problems.

Only the differences to the Active Speech ECU are stated below. 

6.3.2 Detectors 

6.3.2.1 Background detector 

An energy level and energy change detector is used to monitor the signal. If the signal is considered to contain 
background noise and only shows minor energy level changes, a flag is set. The resulting indicator is the 
inBackgroundNoise flag which indicates the signal state of the previous frame. 

6.3.2.2 Voicing detector 

The received LTP gain is monitored and used to prevent the use of the background ECU actions in possibly voiced 
segments. A median filtered LTP gain value with a varying filter memory length is thresholded to provide the correct 
voicing decision. Additionally, a counter voicedHangover is used to monitor the time since a frame was presumably 
voiced. 

6.3.3 Background ECU Actions 

The BFI and PDFI indications are used together with the flag inBackgroundNoise and the counter voicedHangover to
adjust the LTP part and the innovation part of the excitation. The actions are only taken if the previous frame has been
classified as background noise and sufficient time has passed since the last voiced frame was detected.

The background ECU actions are: energy control of the excitation signal, relaxed LTP lag control, stronger limitation 
of the LTP gain, adjusted adaptation of the Gain-Contour-Smoothing algorithm and modified adaptation of the Anti- 
Sparseness Procedure. 

6.4 Substitution and muting of lost SID frames 

In the speech decoder a single frame classified as SID_BAD shall be substituted by the last valid SID frame information
and the procedure for valid SID frames shall be applied. If the time between SID information updates (updates are
specified by SID_UPDATE arrivals and occasionally by SID_FIRST arrivals, see 06.92) is greater than one second, this
shall lead to attenuation.
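
The one second condition can be tracked with a frame counter, as in the C sketch below. It assumes 20 ms frames, so that one second corresponds to 50 frames. The structure and function names are hypothetical, and the sketch only decides when attenuation should start; how strongly the comfort noise is attenuated is not specified here.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAMES_PER_SECOND 50   /* assumes 20 ms frames */

typedef struct {
    int frames_since_sid_update;   /* frames since the last SID information update */
} sid_mute_t;

/* Call once per frame during comfort noise generation. sid_info_updated is
 * true when a valid SID_UPDATE or SID_FIRST frame arrived. Returns true when
 * more than one second has passed without an update, i.e. when the comfort
 * noise should be attenuated. */
bool sid_mute_step(sid_mute_t *st, bool sid_info_updated)
{
    assert(st != 0);

    if (sid_info_updated)
        st->frames_since_sid_update = 0;
    else if (st->frames_since_sid_update <= FRAMES_PER_SECOND)
        st->frames_since_sid_update++;   /* stop counting once past the limit */

    return st->frames_since_sid_update > FRAMES_PER_SECOND;
}
```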
7 Example ECU/BFH Solution 2

This is an alternative example solution which is a simplified version of Example ECU/BFH Solution 1.

7.1 State Machine 

This example solution for substitution and muting is based on a state machine with seven states (Figure 1, same state 
machine as in Example 1). 

The system starts in state 0. Each time a bad frame is detected, the state counter is incremented by one and is saturated 
when it reaches 6. Each time a good speech frame is detected, the state counter is reset to zero, except when we are in 
state 6, where we set the state counter to 5. The state indicates the quality of the channel: the larger the state counter, the 
worse the channel quality is. The control flow of the state machine can be described by the following C code (BFI = 
bad frame indicator, State = state variable):

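if (BFI != 0)
    State = State + 1;   /* bad frame: move to a worse state */
else if (State == 6)
    State = 5;           /* good frame in state 6: step back to 5 only */
else
    State = 0;           /* good frame: reset */
if (State > 6)
    State = 6;           /* saturate at 6 */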

In addition to this state machine, the Bad Frame Flag from the previous frame is checked (prevBFI). The processing
depends on the value of the State variable. In states 0 and 5, the processing depends also on the two flags BFI and
prevBFI.

7.2 Substitution and muting of lost speech frames 

7.2.1 BFI = 0, prevBFI = 0, State = 0

No error is detected in the received or in the previous received speech frame. The received speech parameters are used 
normally in the speech synthesis. The current frame of speech parameters is saved. 

7.2.2 BFI = 0, prevBFI = 1, State = 0 or 5

No error is detected in the received speech frame but the previous received speech frame was bad. The LTP gain and
fixed codebook gain are limited below the values used for the last received good subframe:

$$ g^p = \begin{cases} g^p, & g^p \le g^p(-1) \\ g^p(-1), & g^p > g^p(-1) \end{cases} \qquad (7) $$

where $g^p$ = current decoded LTP gain, $g^p(-1)$ = LTP gain used for the last good subframe (BFI = 0), and

$$ g^c = \begin{cases} g^c, & g^c \le g^c(-1) \\ g^c(-1), & g^c > g^c(-1) \end{cases} \qquad (8) $$

where $g^c$ = current decoded fixed codebook gain and $g^c(-1)$ = fixed codebook gain used for the last good subframe
(BFI = 0).

The rest of the received speech parameters are used normally in the speech synthesis. The current frame of speech 
parameters is saved. 
7.2.3 BFI = 1, prevBFI = 0 or 1, State = 1...6

An error is detected in the received speech frame and the substitution and muting procedure is started. The LTP gain
and fixed codebook gain are replaced by attenuated values from the previous subframes:

$$ g^p = \begin{cases} P(state)\, g^p(-1), & g^p(-1) \le \mathrm{median5}(g^p(-1), \ldots, g^p(-5)) \\ P(state)\, \mathrm{median5}(g^p(-1), \ldots, g^p(-5)), & g^p(-1) > \mathrm{median5}(g^p(-1), \ldots, g^p(-5)) \end{cases} \qquad (9) $$

where $g^p$ = current decoded LTP gain, $g^p(-1), \ldots, g^p(-n)$ = LTP gains used for the last n subframes,
median5() = 5-point median operation, P(state) = attenuation factor (P(1) = 0.98, P(2) = 0.98, P(3) = 0.8, P(4) = 0.3,
P(5) = 0.2, P(6) = 0.2), state = state number, and

$$ g^c = \begin{cases} C(state)\, g^c(-1), & g^c(-1) \le \mathrm{median5}(g^c(-1), \ldots, g^c(-5)) \\ C(state)\, \mathrm{median5}(g^c(-1), \ldots, g^c(-5)), & g^c(-1) > \mathrm{median5}(g^c(-1), \ldots, g^c(-5)) \end{cases} \qquad (10) $$

where $g^c$ = current decoded fixed codebook gain, $g^c(-1), \ldots, g^c(-n)$ = fixed codebook gains used for the last n
subframes, median5() = 5-point median operation, C(state) = attenuation factor (C(1) = 0.98, C(2) = 0.98, C(3) = 0.98,
C(4) = 0.98, C(5) = 0.98, C(6) = 0.7), and state = state number.

The higher the state value is, the more the gains are attenuated. Also the memory of the predictive fixed codebook gain
is updated by using the average value of the past four values in the memory:

$$ ener(0) = \frac{1}{4} \sum_{i=1}^{4} ener(-i) \qquad (11) $$

The past LSFs are used by shifting their values towards their mean:

$$ \mathit{lsf\_q1}(i) = \mathit{lsf\_q2}(i) = \alpha \, \mathit{past\_lsf\_q}(i) + (1 - \alpha) \, \mathit{mean\_lsf}(i), \quad i = 0 \ldots 9 \qquad (12) $$

where $\alpha$ = 0.95, lsf_q1 and lsf_q2 are two sets of LSF-vectors for current frame, past_lsf_q is lsf_q2 from the previous
frame, and mean_lsf is the average LSF-vector. Note that two sets of LSFs are available only in the 12.2 mode.

7.2.3.1 LTP-lag update 

The LTP-lag values are replaced by the past value from the 4th subframe of the previous frame (12.2 mode) or slightly 
modified values based on the last correctly received value (all other modes). 

7.2.4 Innovation sequence 

The received fixed codebook innovation pulses from the erroneous frame are used in the state in which they were 
received when corrupted data are received. In the case when no data were received random fixed codebook indices 
should be employed. 

7.3 Substitution and muting of lost SID frames 

In the speech decoder a single frame classified as SID_BAD shall be substituted by the last valid SID frame information
and the procedure for valid SID frames shall be applied. If the time between SID information updates (updates are
specified by SID_UPDATE arrivals and occasionally by SID_FIRST arrivals) is greater than one second, this shall lead to
attenuation.